feat: Add multi-arch Docker builds and auto-download template inputs #8

Merged
jamesbrink merged 2 commits into main from feat/aarch64-docker-support
Dec 15, 2025

@jamesbrink

Summary

This PR adds two significant enhancements to the project:

  1. Multi-architecture Docker builds - CPU images now build for both amd64 (x86_64) and arm64 (aarch64/Apple Silicon)
  2. Auto-download workflow template input files - Template example images are automatically downloaded on startup

Multi-Architecture Docker Support

  • CPU images: Built for both linux/amd64 and linux/arm64 platforms
  • CUDA images: Remain linux/amd64 only (NVIDIA CUDA requirement)
  • CI Workflow: Uses QEMU emulation via docker/setup-qemu-action for cross-platform builds
  • Manifests: Creates multi-arch manifests so users can docker pull without specifying architecture

This enables Apple Silicon Mac users to run ComfyUI in Docker without Rosetta emulation overhead.
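
As a rough illustration of the naming used above, the sketch below maps the machine name reported by `uname -m` to the Docker architecture label. The function name `docker_arch` is hypothetical and is not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a machine name (as printed by `uname -m`)
# into the Docker architecture label used in the image tags.
docker_arch() {
    local machine=${1:-$(uname -m)}
    case "$machine" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        *)
            echo "unsupported machine type: $machine" >&2
            return 1
            ;;
    esac
}
```

With the multi-arch manifest in place, `docker pull` should pick the matching image on its own; a helper like this would mainly be useful when addressing the architecture-specific intermediate tags directly.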

Template Input Files Auto-Download

Workflow templates reference example input images (e.g., example.png, city_snow.png) that aren't bundled with the ComfyUI package. These are hosted on GitHub and need to be downloaded separately.

New scripts/template_inputs.sh:

  • Fetches manifest from Comfy-Org/workflow_templates repository
  • Downloads missing input files to $BASE_DIR/input/
  • Non-blocking: doesn't fail startup if network unavailable
  • Caches manifest for 7 days to minimize network requests
  • Skips existing files for faster subsequent startups
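
One possible way to implement the 7-day cache check is sketched below. It is an assumption about the approach, not the script's actual code; the function name `manifest_is_fresh` and its parameters are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the cached manifest exists and was
# modified less than max_age_days ago (default 7, matching the TTL above).
manifest_is_fresh() {
    local cache_file=$1
    local max_age_days=${2:-7}

    [[ -f "$cache_file" ]] || return 1

    # find prints the path only when the modification time is recent enough,
    # so a non-empty result means the cache is still fresh.
    [[ -n "$(find "$cache_file" -mtime "-${max_age_days}" -print 2>/dev/null)" ]]
}
```

A caller could skip the network fetch entirely when `manifest_is_fresh "$manifest_cache"` succeeds. Using `find -mtime` avoids the `stat` syntax differences between macOS and Linux.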

Runtime dependencies:

  • Added curl and jq to PATH via makeWrapper
  • Added jq to Docker images

Changes

| File | Description |
| --- | --- |
| .github/workflows/docker.yml | Multi-arch build matrix with QEMU, manifest creation |
| flake.nix | makeWrapper with curl/jq, added jq to Docker images |
| scripts/template_inputs.sh | New script for downloading template inputs |
| scripts/install.sh | Integration of template download during setup |
| README.md | Apple Silicon Docker instructions |
| CLAUDE.md | Updated documentation |

Docker Tags After This PR

| Tag | Architectures | GPU Support |
| --- | --- | --- |
| latest | amd64, arm64 | CPU only |
| latest-cuda | amd64 | NVIDIA CUDA |
| X.Y.Z | amd64, arm64 | CPU only |
| X.Y.Z-cuda | amd64 | NVIDIA CUDA |

Test Plan

  • nix flake check passes all checks (shellcheck, ruff, pyright, nixfmt)
  • nix build .#packages.x86_64-linux.dockerImage builds successfully (amd64)
  • nix build .#packages.x86_64-linux.dockerImageCuda builds successfully (amd64)
  • nix build .#packages.aarch64-linux.dockerImage builds successfully (arm64)
  • Verified image architectures via manifest.json inspection
  • CI workflow builds and pushes all variants
  • Template input files download correctly on first launch

🤖 Generated with Claude Code

Multi-architecture Docker support:
- Build CPU images for both amd64 and arm64 (Apple Silicon compatible)
- CUDA images remain x86_64 only (NVIDIA requirement)
- Use QEMU emulation in CI for cross-architecture builds
- Create multi-arch manifests for seamless pulling on any platform
- Update README with Apple Silicon Docker instructions

Auto-download workflow template input files:
- Add template_inputs.sh script to download example images on startup
- Fetch manifest from GitHub workflow_templates repository
- Non-blocking download (doesn't fail startup if network unavailable)
- Cache manifest for 7 days to minimize network requests
- Skip existing files for faster subsequent startups
- Add curl and jq to runtime PATH via makeWrapper

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude bot commented Dec 15, 2025

Code Review - PR #8: Multi-arch Docker builds and auto-download template inputs

Overview

This PR adds significant enhancements: multi-architecture Docker support (amd64 + arm64) and automatic template input file downloads. Overall, the implementation is well-structured and follows the project's conventions. I've identified several areas for improvement below.


🟢 Strengths

1. Architecture & Design

  • Modular approach: New template_inputs.sh follows the established pattern with proper source guards
  • Non-blocking design: Template downloads don't fail startup if network is unavailable
  • Intelligent caching: 7-day manifest cache and file existence checks minimize network overhead
  • Multi-arch strategy: Proper separation of concerns between build and manifest creation jobs

2. Code Quality

  • ✅ Consistent with project's bash style guidelines (function-based, proper error handling)
  • ✅ Proper use of logging functions from logger.sh
  • ✅ Good use of source guards to prevent multiple sourcing
  • ✅ Clear documentation in CLAUDE.md and README.md

3. CI/CD Design

  • ✅ Smart job separation (build + create-manifests) for parallel builds
  • ✅ Proper QEMU setup for cross-platform builds
  • ✅ Architecture-specific intermediate tags (-amd64, -arm64)

🟡 Issues & Recommendations

1. Shellcheck Warning (Minor)

Location: scripts/template_inputs.sh:57

Issue: Variable display_name is extracted from JSON but never used.

display_name=$(echo "$line" | jq -r '.display_name // empty')

Recommendation: Either use it in logging or remove it:

log_debug "Downloading: ${display_name:-$filename}"

2. Error Handling in Template Download (Medium)

Location: scripts/template_inputs.sh:26-30

Issue: Silent failures with 2>/dev/null make debugging difficult. If curl fails, users get a generic warning without understanding why.

if ! curl -fsSL --connect-timeout 5 --max-time 30 "$TEMPLATE_MANIFEST_URL" -o "$manifest_cache.tmp" 2>/dev/null; then
    log_warn "Could not fetch template manifest (network unavailable or timeout)"

Recommendation: Capture and log curl's actual error:

local curl_error
if ! curl_error=$(curl -fsSL --connect-timeout 5 --max-time 30 "$TEMPLATE_MANIFEST_URL" -o "$manifest_cache.tmp" 2>&1); then
    log_warn "Could not fetch template manifest: ${curl_error}"
    log_debug "URL: $TEMPLATE_MANIFEST_URL"
    return 0
fi

3. Race Condition in Manifest Cache (Medium)

Location: scripts/template_inputs.sh:31

Issue: The temporary file is moved into place without checking that the download is complete and non-empty, so an interrupted or empty download could replace a good cache.

mv "$manifest_cache.tmp" "$manifest_cache"

Recommendation: Add error checking:

if [[ -s "$manifest_cache.tmp" ]]; then
    mv "$manifest_cache.tmp" "$manifest_cache"
else
    log_warn "Downloaded manifest is empty, keeping previous cache"
    rm -f "$manifest_cache.tmp"
    return 0
fi

4. jq Error Handling (Medium)

Location: scripts/template_inputs.sh:42, 92

Issue: jq failures are silently ignored, potentially causing infinite loops or incorrect counts.

total_files=$(jq -r '.assets | length' "$manifest_cache" 2>/dev/null || echo "0")

Recommendation: Validate JSON structure explicitly:

if ! jq -e '.assets' "$manifest_cache" &>/dev/null; then
    log_warn "Invalid manifest format, removing cache"
    rm -f "$manifest_cache"
    return 0
fi
total_files=$(jq -r '.assets | length' "$manifest_cache")

5. File Download Safety (Medium-High)

Location: scripts/template_inputs.sh:78-81

Issue: No validation of downloaded content. A compromised GitHub account or MITM attack could inject malicious files.

Current code:

if curl -fsSL --connect-timeout 5 --max-time 60 "$url" -o "$local_path.tmp" 2>/dev/null; then
    mv "$local_path.tmp" "$local_path"

Recommendations:

  1. Verify file size is reasonable (prevent disk exhaustion):

    local max_file_size=$((50 * 1024 * 1024))  # 50MB limit for template images
    local file_size
    file_size=$(stat -f%z "$local_path.tmp" 2>/dev/null || stat -c%s "$local_path.tmp" 2>/dev/null)
    if [[ $file_size -gt $max_file_size ]]; then
        log_warn "File too large ($file_size bytes), skipping: $filename"
        rm -f "$local_path.tmp"
        ((fail_count++))
        continue
    fi
  2. Validate file is actually an image (basic magic byte check):

    # Check file magic bytes for common image formats
    local file_type
    file_type=$(file -b --mime-type "$local_path.tmp" 2>/dev/null || echo "unknown")
    if [[ ! $file_type =~ ^image/ ]]; then
        log_warn "Downloaded file is not an image ($file_type): $filename"
        rm -f "$local_path.tmp"
        ((fail_count++))
        continue
    fi
  3. Add checksums to manifest (ideal long-term solution): Comfy-Org should publish SHA256 hashes alongside the manifest.
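
If the upstream manifest ever did publish SHA256 hashes, verification could look roughly like the sketch below. The function names `sha256_of` and `verify_sha256` are hypothetical, and the sketch assumes the expected hash is given as lowercase hex; the current manifest is not known to carry such a field.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the SHA256 hex digest of a file, using
# sha256sum (common on Linux) or shasum (common on macOS).
sha256_of() {
    local file=$1
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$file" | cut -d' ' -f1
    elif command -v shasum >/dev/null 2>&1; then
        shasum -a 256 "$file" | cut -d' ' -f1
    else
        echo "no SHA256 tool available" >&2
        return 1
    fi
}

# Hypothetical helper: succeed only if the file's digest matches the
# expected lowercase hex string.
verify_sha256() {
    local file=$1
    local expected=$2
    local actual
    actual=$(sha256_of "$file") || return 1
    [[ "$actual" == "$expected" ]]
}
```

In the download loop, a failed `verify_sha256 "$local_path.tmp" "$expected_hash"` would be handled like the size and MIME checks: delete the temporary file and count the download as failed.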


6. Docker Workflow - Manifest Creation Edge Cases (Medium)

Location: .github/workflows/docker.yml:187-199

Issue: Manifest creation assumes both amd64 and arm64 images were successfully pushed. If one architecture fails silently, docker manifest create will fail.

Recommendation: Add existence checks before creating manifests:

# Verify both architecture images exist before creating manifest
for arch_tag in "${REGISTRY}/${IMAGE_NAME}:latest-amd64" "${REGISTRY}/${IMAGE_NAME}:latest-arm64"; do
    if ! docker manifest inspect "$arch_tag" &>/dev/null; then
        echo "ERROR: Required image not found: $arch_tag"
        exit 1
    fi
done

docker manifest create "$MANIFEST_TAG" \
    "${REGISTRY}/${IMAGE_NAME}:latest-amd64" \
    "${REGISTRY}/${IMAGE_NAME}:latest-arm64"

7. Nix Build Architecture Platform Setting (Low)

Location: .github/workflows/docker.yml:81

Issue: extra-platforms = aarch64-linux is set globally, but Ubuntu runners are x86_64 and rely on QEMU emulation. This setting may not be necessary since Nix will use the emulated platform anyway.

Recommendation: Test if this line is actually needed. If builds work without it, remove it for clarity. The QEMU action should handle platform emulation transparently.


8. Platform Detection in Template Script (Low)

Location: scripts/template_inputs.sh:119

Issue: Uses stat command which has different syntax on macOS vs Linux. The file size validation I suggested above needs platform detection.

Recommendation: Use portable stat wrapper (add to beginning of file):

# Platform-agnostic file size function
get_file_size() {
    local file=$1
    if stat -f%z "$file" &>/dev/null; then
        # macOS/BSD
        stat -f%z "$file"
    else
        # Linux/GNU
        stat -c%s "$file"
    fi
}

9. Missing Integration Test Checklist Item

Location: PR description test plan

Issue: Two test items are unchecked:

  • CI workflow builds and pushes all variants
  • Template input files download correctly on first launch

Recommendation: These should be tested before merge, especially the manifest creation logic which is complex and error-prone.


🔒 Security Considerations

Good Security Practices

  1. Timeouts on network operations prevent hanging
  2. Non-blocking downloads don't compromise startup security
  3. Files written to $input_dir respect BASE_DIR validation from config.sh

⚠️ Security Concerns

  1. No integrity validation on downloaded files (see issue 5 above)
  2. Trust in GitHub's CDN: Files are fetched from raw.githubusercontent.com without verification
  3. Potential for path traversal: While unlikely with GitHub's manifest, validate file_path doesn't contain ../:
    # Add after line 57
    if [[ "$file_path" =~ \.\. ]]; then
        log_warn "Skipping potentially malicious file_path: $file_path"
        continue
    fi

🧪 Test Coverage

Missing tests:

  • No automated tests for template_inputs.sh functions
  • No integration test for multi-arch Docker builds
  • No validation that arm64 images actually work on Apple Silicon

Recommendations:

  1. Add shellcheck to CI for new script (it's already in nix run .#lint)
  2. Consider adding a basic integration test that mocks the GitHub API
  3. Test actual arm64 image on Apple Silicon hardware before declaring support

📊 Performance Considerations

Good Performance Choices

  1. Parallel builds in CI (3 builds simultaneously)
  2. Manifest caching prevents repeated downloads (7-day TTL)
  3. File existence checks skip re-downloads
  4. Quick heuristic check (needs_template_inputs) avoids slow operations

💡 Optimization Opportunities

  1. Parallel downloads: Currently downloads files sequentially. For 20+ files, consider parallel downloads:

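    # Use xargs for parallel downloads (max 5 concurrent).
    # download_single_file is a hypothetical function and must be exported first (export -f).
    # -d '\n' (GNU xargs only) keeps the quotes inside each JSON line intact.
    jq -c '.assets[]' "$manifest_cache" | \
        xargs -d '\n' -P 5 -I {} bash -c 'download_single_file "$@"' _ {}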
  2. Progress reporting: For large download sets, current progress only shows every 10 files. Consider more frequent updates or a progress bar.


📝 Documentation

Well Documented

  • README.md has clear Apple Silicon instructions
  • CLAUDE.md updated with multi-arch details
  • PR description is comprehensive and well-structured

📌 Minor Suggestions

  1. Add comment in template_inputs.sh explaining the 7-day cache TTL choice
  2. Document the manifest JSON schema format (helps future maintainers)
  3. Add troubleshooting section to README for template download failures

🎯 Overall Assessment

Quality: 8/10

  • Well-designed solution with good architectural decisions
  • Follows project conventions consistently
  • Addresses a real need (Apple Silicon support + template files)

Readiness: 7/10

  • Core functionality is solid
  • Needs fixes for error handling and security validation
  • Missing integration test verification

Impact: High

  • Enables Apple Silicon users to run ComfyUI natively
  • Improves UX with automatic template downloads
  • Demonstrates good CI/CD practices

Recommendations for Merge

Must Fix (before merge):

  1. Fix shellcheck warning (remove unused display_name variable)
  2. Add file size validation to prevent DoS via large files
  3. Add path traversal validation for file_path
  4. Validate manifest JSON structure before processing
  5. Test CI workflow actually works (complete unchecked test items)

Should Fix (before or soon after merge):

  1. Improve error messages with actual curl errors
  2. Add atomic write pattern for manifest cache
  3. Add Docker manifest existence checks
  4. Consider adding file type validation

Nice to Have (future improvements):

  1. Add SHA256 verification (requires upstream manifest changes)
  2. Implement parallel downloads
  3. Add integration tests
  4. Better progress reporting for downloads

🤖 Code Review Summary

This is a well-executed PR that adds meaningful functionality. The multi-arch Docker support is implemented correctly with proper job separation and manifest creation. The template download script follows project conventions and includes smart optimizations.

The main concerns are around error handling robustness and file download security. These are not showstoppers but should be addressed to ensure production readiness.

Recommendation: ✅ Approve with requested changes - Fix the must-fix items above, verify CI tests pass, then merge.

Security improvements in template_inputs.sh:
- Add path traversal validation to reject file paths containing ".."
- Add file size validation (50MB limit) to prevent DoS attacks
- Add file type validation using MIME type checking
- Validate JSON manifest structure before processing

Robustness improvements:
- Capture and display actual curl error messages for debugging
- Add atomic write pattern with empty file validation
- Platform-agnostic file size detection (macOS/Linux compatible)
- Remove unused display_name variable (shellcheck warning)

CI/CD improvements in docker.yml:
- Add manifest existence verification before creating multi-arch manifests
- Use variables for architecture-specific tags for consistency
- Add clear error messages when builds fail

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude bot commented Dec 15, 2025

Code Review: PR #8 - Multi-arch Docker builds and auto-download template inputs

Overall Assessment

This is a well-implemented PR that adds valuable functionality. The code demonstrates strong attention to security, robustness, and error handling. I've identified several areas for consideration below.

✅ Strengths

Security Considerations (Excellent)

  • Path traversal protection: Line 98 in template_inputs.sh checks for .. in file paths
  • File size validation: 50MB limit prevents DoS attacks (line 123)
  • MIME type validation: Validates file types when file command is available (lines 138-149)
  • JSON structure validation: Validates manifest structure before processing (line 65)
  • Atomic writes: Uses .tmp files and mv for atomic operations
  • Non-blocking failures: All network errors gracefully degrade without blocking startup

Code Quality

  • Modular design: Follows project's modular script architecture
  • Cross-platform support: Platform-agnostic functions for macOS/Linux (lines 15-24, 195-199)
  • Error handling: Comprehensive error capture with informative messages
  • Shellcheck clean: Script passes shellcheck validation
  • Documentation: Clear comments and function documentation

CI/CD Improvements

  • Multi-arch builds: Properly uses QEMU for ARM64 emulation
  • Fail-fast disabled: Allows all matrix builds to complete even if one fails
  • Manifest verification: Checks image existence before creating manifests (lines 199-207)
  • Clear error messages: Provides actionable error information

🔍 Issues & Recommendations

Critical Issues

1. Missing error handling in Docker workflow manifest creation

In .github/workflows/docker.yml lines 183-227, the CUDA manifest creation logic has potential issues:

# Line 235-237: This pulls and retags, but doesn't verify the pull succeeded
docker pull "$CUDA_AMD64_TAG"
docker tag "$CUDA_AMD64_TAG" "$CUDA_TAG"
docker push "$CUDA_TAG"

Recommendation: Add error checking after the pull:

if ! docker pull "$CUDA_AMD64_TAG"; then
  echo "ERROR: Failed to pull CUDA image"
  exit 1
fi
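
The same pattern could be extended to the tag and push steps. The sketch below is one possible shape, not the workflow's actual code; the function name `retag_and_push` is hypothetical, and it relies only on the `docker pull`, `docker tag` and `docker push` commands already used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull an image, retag it, and push the new tag,
# stopping with a clear message at the first step that fails.
retag_and_push() {
    local source_tag=$1
    local target_tag=$2

    if ! docker pull "$source_tag"; then
        echo "ERROR: Failed to pull $source_tag" >&2
        return 1
    fi
    if ! docker tag "$source_tag" "$target_tag"; then
        echo "ERROR: Failed to tag $source_tag as $target_tag" >&2
        return 1
    fi
    if ! docker push "$target_tag"; then
        echo "ERROR: Failed to push $target_tag" >&2
        return 1
    fi
}
```

In the workflow step this might be called as `retag_and_push "$CUDA_AMD64_TAG" "$CUDA_TAG" || exit 1`, so that a failure in any of the three commands fails the job.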

Security Concerns

2. Path traversal check is incomplete (Medium severity)

template_inputs.sh line 98 only checks for literal ..:

if [[ "$file_path" =~ \.\. ]]; then

This won't catch encoded variations like %2e%2e, symlinks, or absolute paths.

Recommendation: Use a more robust check:

# Get the absolute canonical path
local canonical_path
canonical_path=$(readlink -f "$input_dir/$filename" 2>/dev/null || echo "")
# Check if it's within the input directory
if [[ ! "$canonical_path" =~ ^"$input_dir"/ ]]; then
    log_warn "File path escapes input directory: $file_path"
    ((fail_count++))
    continue
fi
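
A dependency-free alternative is to reject suspicious paths before they are ever joined to the input directory. The sketch below is an assumption about how that could look; the function name `is_safe_relative_path` is hypothetical. It does not resolve symlinks or decode URL-encoded sequences, so it complements the canonical-path check rather than replacing it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only for a non-empty relative path that has
# no ".." component. Names that merely contain two dots (e.g. "notes..txt")
# are accepted, unlike with the plain regex check.
is_safe_relative_path() {
    local path=$1
    case "$path" in
        "")      return 1 ;;  # empty path
        /*)      return 1 ;;  # absolute path
        ..)      return 1 ;;  # exactly ".."
        ../*)    return 1 ;;  # leading "../"
        */..)    return 1 ;;  # trailing "/.."
        */../*)  return 1 ;;  # "/../" in the middle
    esac
    return 0
}
```

Because it uses only shell pattern matching, this check behaves the same on macOS and Linux and does not depend on `readlink -f` or `realpath` being available.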

3. URL validation missing

template_inputs.sh line 90 extracts URLs from the manifest but doesn't validate them.

Recommendation: Add URL scheme validation:

# Validate URL uses HTTPS
if [[ ! "$url" =~ ^https:// ]]; then
    log_warn "Skipping non-HTTPS URL: $url"
    ((fail_count++))
    continue
fi

Performance Considerations

4. Sequential file downloads

Lines 87-164 in template_inputs.sh download files sequentially. With many files, this could be slow.

Recommendation: Consider parallel downloads with a concurrency limit:

# Use background jobs with a limit
max_parallel=5
while read -r line; do
    # ... download logic ...
    download_file "$url" "$local_path" &
    # Wait if we hit max parallel
    if ((++job_count % max_parallel == 0)); then
        wait
    fi
done
wait  # Wait for remaining jobs
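
A self-contained version of the batching idea is sketched below. The function name `run_in_batches` is hypothetical; it runs a given command once per input line and waits after every full batch, which is simpler than a true job pool but keeps concurrency bounded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lines from stdin and run "$@ <line>" for each,
# with at most max_parallel commands running at the same time.
run_in_batches() {
    local max_parallel=$1
    shift
    local job_count=0
    local line

    while IFS= read -r line; do
        "$@" "$line" &                 # start the job in the background
        job_count=$((job_count + 1))
        if ((job_count % max_parallel == 0)); then
            wait                       # drain the current batch
        fi
    done
    wait                               # wait for the final, partial batch
}
```

For example, `jq -r '.assets[].url' "$manifest_cache" | run_in_batches 5 download_file` would start up to five downloads at a time, assuming a `download_file` function that takes a URL and a manifest with a `url` field per asset. Background jobs cannot update counters such as `fail_count` in the parent shell, so failures would need to be reported another way, for instance through marker files.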

5. Inefficient file counting in needs_template_inputs()

Line 208 uses find | wc -l which can be slow:

file_count=$(find "$input_dir" -maxdepth 1 -type f ! -name ".*" 2>/dev/null | wc -l)

Recommendation: Use a more efficient approach:

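# Count entries directly with shell globbing.
# Note: unlike the find command above, this also counts subdirectories,
# and nullglob stays enabled for the rest of the shell.
shopt -s nullglob
files=("$input_dir"/*)
file_count=${#files[@]}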
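
If the count should match `find -maxdepth 1 -type f ! -name ".*"` more closely, a loop over the glob can filter for regular files. This is a sketch under that assumption; the function name `count_regular_files` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count non-hidden regular files directly inside a
# directory without spawning find or wc. The default glob skips dotfiles,
# and an unmatched pattern fails the -f test, so nullglob is not needed.
count_regular_files() {
    local dir=$1
    local count=0
    local entry

    for entry in "$dir"/*; do
        if [[ -f "$entry" ]]; then
            count=$((count + 1))
        fi
    done
    printf '%s\n' "$count"
}
```

Note that `-f` follows symlinks, so a symlink pointing at a regular file is counted here, whereas `find -type f` would skip it.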

Code Quality

6. Unused variable i in template_inputs.sh

Line 163 increments i but it's never used:

((i++))

Recommendation: Remove the unused variable.

7. Hard-coded magic number

Line 209 has magic number 20:

if ((file_count < 20)); then

Recommendation: Define as a constant:

MIN_EXPECTED_TEMPLATE_FILES=20

8. Missing set -euo pipefail in template_inputs.sh

Unlike other scripts in the project, template_inputs.sh doesn't use strict mode. While it's sourced (not executed), adding these would catch more errors during development.

Recommendation: Consider adding after the guard clause:

set -euo pipefail

Testing Gaps

9. No tests for template_inputs.sh

The PR description mentions testing the builds but doesn't mention testing the template download functionality.

Recommendation: Add to test plan:

  • Test with network unavailable
  • Test with malformed manifest
  • Test with large files (>50MB)
  • Test with various file types
  • Test manifest caching (7-day expiry)

10. Multi-arch builds not fully tested

Test plan shows checkboxes for building but:

- [ ] CI workflow builds and pushes all variants
- [ ] Template input files download correctly on first launch

These critical items are unchecked.

Recommendation: Complete testing before merge, especially:

  • Verify ARM64 images actually work on Apple Silicon
  • Verify multi-arch manifest pulls correct architecture automatically

Documentation

11. Missing failure scenarios in README

The README updates don't mention what happens when template downloads fail.

Recommendation: Add a troubleshooting section:

### Troubleshooting Template Downloads
If workflow templates fail to load, template input files may be missing. To manually download:
```bash
# Check logs for download errors
$BASE_DIR/input/.template_manifest.json
```

Minor Issues

12. Inconsistent error message format

Some error messages end with periods, others don't. Be consistent.

13. Docker workflow could benefit from caching

The multi-arch builds rebuild everything. Consider using Docker layer caching with docker/build-push-action which has better caching support than docker load.

📊 Performance Impact

  • Startup time: Added 5-30 seconds on first run (network-dependent)
  • Subsequent starts: <1 second (skips existing files)
  • Network usage: ~100-500MB for template files (one-time)
  • Disk usage: ~100-500MB in input directory
  • CI time: Significant increase due to QEMU emulation for ARM64 (~2-3x longer)

🔒 Security Summary

| Issue | Severity | Impact |
| --- | --- | --- |
| Incomplete path traversal check | Medium | Could allow directory traversal in edge cases |
| Missing URL validation | Low | Could download from non-HTTPS sources |
| No checksum validation | Low | Cannot verify file integrity |
| File type validation is optional | Low | Depends on file command availability |

📝 Recommendations Summary

Must Fix Before Merge

  1. Add error handling for Docker manifest creation
  2. Improve path traversal validation
  3. Add URL scheme validation (HTTPS only)
  4. Complete testing of multi-arch builds and template downloads

Should Fix

  1. Remove unused variable i
  2. Add constants for magic numbers
  3. Add checksum validation for downloaded files
  4. Document failure scenarios

Nice to Have

  1. Implement parallel downloads
  2. Optimize file counting
  3. Add comprehensive tests for template_inputs.sh
  4. Consider Docker layer caching for CI

🎯 Conclusion

This PR adds significant value by enabling Apple Silicon support and auto-downloading template files. The code quality is generally high with good security practices. However, I recommend addressing the critical security issues and completing the testing before merge.

Recommendation: Request changes for items 1-4, then approve after fixes.


Review conducted by Claude Code - focus areas: security, performance, code quality, CI/CD

@jamesbrink jamesbrink merged commit e5971a4 into main Dec 15, 2025
6 checks passed
jamesbrink added a commit that referenced this pull request Dec 15, 2025
Security improvements:
- Add error handling for docker pull in CUDA manifest creation
- Improve path traversal validation with canonical path check
- Add HTTPS URL scheme validation for downloads
- Check for absolute paths and .. in filenames

Code quality:
- Remove unused variable 'i' in download loop
- Add MIN_EXPECTED_TEMPLATE_FILES constant (was magic number 20)

These changes address the "Must Fix" and "Should Fix" items from
the code review on PR #8.
jamesbrink added a commit that referenced this pull request Dec 15, 2025
* fix: Address code review security and robustness concerns

Security improvements:
- Add error handling for docker pull in CUDA manifest creation
- Improve path traversal validation with canonical path check
- Add HTTPS URL scheme validation for downloads
- Check for absolute paths and .. in filenames

Code quality:
- Remove unused variable 'i' in download loop
- Add MIN_EXPECTED_TEMPLATE_FILES constant (was magic number 20)

These changes address the "Must Fix" and "Should Fix" items from
the code review on PR #8.

* fix: Address follow-up code review feedback

- Remove realpath dependency for cross-platform compatibility (macOS)
- Simplify path validation condition (remove redundant regex check)
- Add error handling for docker tag and push operations
@jamesbrink jamesbrink deleted the feat/aarch64-docker-support branch February 20, 2026 16:39